Data Mining as a Method for Linguistic Analysis: Dutch Diminutives*
Authors
Abstract
We propose to use data mining techniques (inductive techniques for the automatic acquisition of comprehensible knowledge from data) as a method in linguistic analysis. In the past, such techniques have mainly been used in linguistic engineering applications to solve knowledge acquisition bottlenecks. In this paper we show that they can also assist in linguistic theory formation by providing a new tool for the evaluation of linguistic hypotheses, for the extraction of rules from corpora, and for the discovery of useful linguistic categories. By applying a rule induction method to a particular linguistic task (diminutive formation in Dutch), we show that data mining techniques can be used to test linguistic hypotheses about this morphological process and to discover interesting morphological and phonological rules and categories.

* Preparation of this paper was supported by a Research Grant of the Fund for Joint Basic Research (FKFO 2.0101.94) of the National Fund for Scientific Research (NFWO), by a VNC project of NFWO-NWO (contract number G.2201.96), and by a grant from the Research Council of the University of Antwerp.
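The abstract does not say which rule induction algorithm is used, so the following is only a minimal sketch of the general idea, assuming the simple 1R algorithm (one rule set per attribute, keep the attribute with the fewest training errors) as a stand-in. The handful of nouns, the crude spelling-based features, the fallback suffix, and all names (`EXAMPLES`, `FEATURES`, `one_r`, `predict`) are hypothetical illustrations, not the paper's data or method; a real analysis would use phonological and stress features of the final syllables.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: a few Dutch nouns paired with the diminutive
# suffix allomorph they take (-je, -tje, -pje, -kje, -etje).
# Illustrative only; this is not the corpus used in the paper.
EXAMPLES = [
    ("huis", "je"), ("boek", "je"), ("kat", "je"),
    ("boom", "pje"), ("raam", "pje"), ("kam", "etje"),
    ("bal", "etje"), ("ring", "etje"), ("koning", "kje"),
    ("stoel", "tje"), ("tafel", "tje"), ("deur", "tje"),
    ("auto", "tje"),
]

# Hypothetical orthographic features (assumption): spelling-based stand-ins
# for the phonological features a real study would use.
FEATURES = {
    "last_letter": lambda w: w[-1],
    "last_two_letters": lambda w: w[-2:],
}


def one_r(examples, features):
    """Induce 1R rules: for each feature, map every value to its majority
    class, then keep the feature whose rules make the fewest training errors."""
    best = None
    for name, extract in features.items():
        by_value = defaultdict(Counter)
        for word, label in examples:
            by_value[extract(word)][label] += 1
        # Majority class per value; on ties, most_common keeps the label seen first.
        rules = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(sum(c.values()) - c[rules[v]] for v, c in by_value.items())
        if best is None or errors < best[2]:
            best = (name, rules, errors)
    return best


def predict(model, word, features, default="je"):
    """Apply the induced rules; unseen feature values fall back to a
    hypothetical default suffix."""
    name, rules, _ = model
    return rules.get(features[name](word), default)


model = one_r(EXAMPLES, FEATURES)
name, rules, errors = model
print(f"selected feature: {name} ({errors} training errors)")
for value, suffix in sorted(rules.items()):
    print(f"IF {name} = {value!r} THEN -{suffix}")
```

Even on this toy set, the induced rules misclassify *kam* (which takes -etje, not -pje like *raam*), because spelling alone does not encode the vowel-length distinction. Checking whether a hypothesized feature removes such errors is the kind of hypothesis testing the abstract describes.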
Similar sources
Diminutives facilitate word segmentation in natural speech: cross-linguistic evidence.
Final-syllable invariance is characteristic of diminutives (e.g., doggie), which are a pervasive feature of the child-directed speech registers of many languages. Invariance in word endings has been shown to facilitate word segmentation (Kempe, Brooks, & Gillis, 2005) in an incidental-learning paradigm in which synthesized Dutch pseudonouns were used. To broaden the cross-linguistic evidence fo...
DOMAIN DATABASE KNOWLEDGE Incompleteness
There are several different ways data mining (the automatic induction of knowledge from data) can be applied to the problem of natural language processing. In the past, data mining techniques have mainly been used in linguistic engineering applications to solve knowledge acquisition bottlenecks. In this paper, we show that they can also assist in linguistic theory formation by providing a new to...
DOMAIN DATABASE KNOWLEDGE Incompleteness Noise
There are several different ways data mining (the automatic induction of knowledge from data) can be applied to the problem of natural language processing. In the past, data mining techniques have mainly been used in linguistic engineering applications to solve knowledge acquisition bottlenecks. In this paper, we show that they can also assist in linguistic theory formation by providing a new tool for...
Belgian Dutch versus Netherlandic Dutch: New patterns of divergence? On pronouns of address and diminutives
The linguistic climate in northern Belgium (Flanders) has been changing in recent years. A new corpus of spoken Dutch meets the need for data reflecting actual and present-day language use in this part of the Dutch language area. The ‘Spoken Dutch Corpus’ allows us to uncover and analyse the present state of colloquial Belgian Dutch and the changes which mark this condition. This paper discusse...
Diminutives in child-directed speech supplement metric with distributional word segmentation cues.
In two experiments, we explored whether diminutives (e.g., birdie, Patty, bootie), which are characteristic of child-directed speech in many languages, aid word segmentation by regularizing stress patterns and word endings. In an implicit learning task, adult native speakers of English were exposed to a continuous stream of synthesized Dutch nonsense input comprising 300 randomized repetitions ...